Network Stock Portfolio Optimization¶


Context and Problem Statement¶

Active investing in the asset management industry aims to beat the stock market's average return: portfolio managers track a benchmark index and try to outperform it with portfolios of their own construction.

Portfolio construction involves selecting stocks that have a higher probability of outperforming the tracked index, such as the S&P 500. In this project we use network analysis to select a basket of stocks and build two portfolios. We then simulate portfolio value by investing a fixed amount, holding the portfolio for a full year, and comparing the result against the S&P 500 index.

This project follows the approach described in the research paper below:

Dynamic portfolio strategy using a clustering approach

Proposed Approach¶

  • Collect the price data for all S&P 500 components from 2011 through 2020
  • Compute log returns for the S&P 500 components over the same period
  • Compute the correlation matrix for the above log returns
  • Find the top n central and peripheral stocks based on the following network topological parameters:
    • Degree centrality
    • Betweenness centrality
    • Distance on degree criterion
    • Distance on correlation criterion
    • Distance on distance criterion
  • Simulate the performance of central and peripheral portfolios against the performance of S&P 500 for the year 2021
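The steps above can be sketched end-to-end on synthetic data (the ticker names and price paths below are made up for illustration):

```python
import numpy as np
import pandas as pd
import networkx as nx

rng = np.random.default_rng(0)
# synthetic daily prices for five made-up tickers (a geometric random walk)
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=(250, 5)), axis=0)),
    columns=["AAA", "BBB", "CCC", "DDD", "EEE"],
)

log_returns = np.log(prices / prices.shift(1)).dropna()  # daily log returns
corr = log_returns.corr()                                # correlation matrix
dist = np.sqrt(2 * (1 - corr))                           # correlation -> distance
mst = nx.minimum_spanning_tree(nx.Graph(dist))           # filter down to the MST

dc = nx.degree_centrality(mst)
most_central = max(dc, key=dc.get)                       # candidate "central" stock
print(most_central)
```

The real notebook applies exactly this pipeline, only on the 450 surviving S&P 500 components and with several ranking criteria instead of degree centrality alone.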

Loading the Libraries¶

We first need to install the pandas_datareader library using !pip install pandas_datareader

In [27]:
!pip install pandas_datareader
Requirement already satisfied: pandas_datareader in c:\users\max power\anaconda3\lib\site-packages (0.10.0)
Requirement already satisfied: lxml in c:\users\max power\anaconda3\lib\site-packages (from pandas_datareader) (5.3.0)
Requirement already satisfied: pandas>=0.23 in c:\users\max power\anaconda3\lib\site-packages (from pandas_datareader) (2.2.3)
Requirement already satisfied: requests>=2.19.0 in c:\users\max power\anaconda3\lib\site-packages (from pandas_datareader) (2.32.3)
Requirement already satisfied: numpy>=1.26.0 in c:\users\max power\anaconda3\lib\site-packages (from pandas>=0.23->pandas_datareader) (2.1.3)
Requirement already satisfied: python-dateutil>=2.8.2 in c:\users\max power\anaconda3\lib\site-packages (from pandas>=0.23->pandas_datareader) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in c:\users\max power\anaconda3\lib\site-packages (from pandas>=0.23->pandas_datareader) (2024.1)
Requirement already satisfied: tzdata>=2022.7 in c:\users\max power\anaconda3\lib\site-packages (from pandas>=0.23->pandas_datareader) (2025.2)
Requirement already satisfied: six>=1.5 in c:\users\max power\anaconda3\lib\site-packages (from python-dateutil>=2.8.2->pandas>=0.23->pandas_datareader) (1.17.0)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\users\max power\anaconda3\lib\site-packages (from requests>=2.19.0->pandas_datareader) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in c:\users\max power\anaconda3\lib\site-packages (from requests>=2.19.0->pandas_datareader) (3.7)
Requirement already satisfied: urllib3<3,>=1.21.1 in c:\users\max power\anaconda3\lib\site-packages (from requests>=2.19.0->pandas_datareader) (2.3.0)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\max power\anaconda3\lib\site-packages (from requests>=2.19.0->pandas_datareader) (2025.10.5)
In [29]:
import tqdm
import requests
import numpy as np
import pandas as pd
import seaborn as sns
import networkx as nx
import plotly.express as px
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt
import pandas_datareader.data as web
import os

import warnings
warnings.filterwarnings('ignore')

Getting the S&P 500 Components¶

Beautiful Soup is a library that makes it easy to scrape information from web pages.

https://www.crummy.com/software/BeautifulSoup/bs4/doc/

In [31]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Find tickers of companies in the S&P 500 from Wikipedia
url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'

# Always include a User-Agent to avoid being blocked
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36'}
resp = requests.get(url, headers=headers)

# Parse with BeautifulSoup
soup = BeautifulSoup(resp.text, 'lxml')

# Verify which tables are present
tables = soup.find_all('table')
print(f"Found {len(tables)} tables on the page")

# The S&P 500 table is usually the first one
table = tables[0]

# Convert to DataFrame for convenience
df = pd.read_html(str(table))[0]

# Extract tickers
tickers = df['Symbol'].str.replace('.', '-', regex=False).tolist()

print(f"Extracted {len(tickers)} tickers.")

#show first 15 elements
print(tickers[:15])
Found 2 tables on the page
Extracted 503 tickers.
['MMM', 'AOS', 'ABT', 'ABBV', 'ACN', 'ADBE', 'AMD', 'AES', 'AFL', 'A', 'APD', 'ABNB', 'AKAM', 'ALB', 'ARE']

Getting the Price Data for all the S&P 500 Components from 2011 to 2020¶

In [6]:
# Download the price data from Yahoo Finance (run once; the result was saved to CSV below)
# price_data = web.DataReader(tickers, 'yahoo', start='2011-01-01', end='2020-12-31')
# price_data = price_data['Adj Close']    # keep only the adjusted close; DataReader also returns open/high/low/volume
# price_data.to_csv('snp500_price_data_2011_to_2020.csv')
In [32]:
# Build the path to the file saved in the Data folder
current_path = os.getcwd()   # the notebook's working directory
main_dir = os.path.dirname(current_path)
file_path = os.path.join(main_dir, "Data", "snp500_price_data_2011_to_2020.csv")

df = pd.read_csv(file_path, index_col=[0])

print(df.head())
                  MMM       AOS        ABT  ABBV   ABMD        ACN       ATVI  \
Date                                                                            
2010-12-31  63.855606  8.113162  17.986767   NaN   9.61  39.143620  11.245819   
2011-01-03  64.218163  8.125947  17.952976   NaN   9.80  39.224346  11.318138   
2011-01-04  64.129395  8.100383  18.121916   NaN   9.80  38.966022  11.327178   
2011-01-05  64.129395  8.285738  18.121916   NaN  10.03  38.974094  11.110217   
2011-01-06  63.737186  8.289999  18.084377   NaN  10.05  39.119389  11.083097   

                  ADM       ADBE        ADP  ...        XEL       XLNX  XYL  \
Date                                         ...                              
2010-12-31  22.385578  30.780001  31.271172  ...  16.221039  23.216919  NaN   
2011-01-03  22.623722  31.290001  31.791464  ...  16.227924  23.569420  NaN   
2011-01-04  22.608845  31.510000  31.676586  ...  16.296804  23.665552  NaN   
2011-01-05  22.713020  32.220001  32.183357  ...  16.200367  23.745670  NaN   
2011-01-06  23.583750  32.270000  32.433369  ...  16.186602  24.146229  NaN   

                  YUM       ZBRA        ZBH       ZION  ZTS  CEG  OGN  
Date                                                                   
2010-12-31  28.547478  37.990002  49.212429  21.089169  NaN  NaN  NaN  
2011-01-03  28.570745  38.200001  50.395058  21.907326  NaN  NaN  NaN  
2011-01-04  28.134232  37.840000  49.725819  21.550467  NaN  NaN  NaN  
2011-01-05  28.268110  37.799999  49.762482  21.672325  NaN  NaN  NaN  
2011-01-06  28.465996  37.480000  48.222305  21.611391  NaN  NaN  NaN  

[5 rows x 505 columns]
In [33]:
df.head()
Out[33]:
MMM AOS ABT ABBV ABMD ACN ATVI ADM ADBE ADP ... XEL XLNX XYL YUM ZBRA ZBH ZION ZTS CEG OGN
Date
2010-12-31 63.855606 8.113162 17.986767 NaN 9.61 39.143620 11.245819 22.385578 30.780001 31.271172 ... 16.221039 23.216919 NaN 28.547478 37.990002 49.212429 21.089169 NaN NaN NaN
2011-01-03 64.218163 8.125947 17.952976 NaN 9.80 39.224346 11.318138 22.623722 31.290001 31.791464 ... 16.227924 23.569420 NaN 28.570745 38.200001 50.395058 21.907326 NaN NaN NaN
2011-01-04 64.129395 8.100383 18.121916 NaN 9.80 38.966022 11.327178 22.608845 31.510000 31.676586 ... 16.296804 23.665552 NaN 28.134232 37.840000 49.725819 21.550467 NaN NaN NaN
2011-01-05 64.129395 8.285738 18.121916 NaN 10.03 38.974094 11.110217 22.713020 32.220001 32.183357 ... 16.200367 23.745670 NaN 28.268110 37.799999 49.762482 21.672325 NaN NaN NaN
2011-01-06 63.737186 8.289999 18.084377 NaN 10.05 39.119389 11.083097 23.583750 32.270000 32.433369 ... 16.186602 24.146229 NaN 28.465996 37.480000 48.222305 21.611391 NaN NaN NaN

5 rows × 505 columns

Missing Data due to Index Rebalancing¶

In [42]:
# Identify stocks with missing data
figure = plt.figure(figsize=(16, 8))
sns.heatmap(df.T.isnull());

The missing data arises because certain stocks moved out of the S&P 500 at some point; other stocks then entered the index to replace them.

Cleaning the Dataset of Null Values

In [43]:
price_data_cleaned = df.dropna(axis=1)  # drop every column that contains any null value
In [38]:
figure = plt.figure(figsize=(16, 8))
sns.heatmap(price_data_cleaned.T.isnull());

The null values are removed: the data is clean, and the heatmap confirms that no missing values remain.

Getting Yearwise Data¶

In [39]:
def get_year_wise_snp_500_data(data, year):
    year_wise_data = data.loc['{}-01-01'.format(year):'{}-12-31'.format(year)]

    return year_wise_data
In [44]:
# Getting year wise data of S&P stocks from 2011 to 2020 -> divide df into 1 df per year
snp_500_2011 = get_year_wise_snp_500_data(price_data_cleaned, 2011)
snp_500_2012 = get_year_wise_snp_500_data(price_data_cleaned, 2012)
snp_500_2013 = get_year_wise_snp_500_data(price_data_cleaned, 2013)
snp_500_2014 = get_year_wise_snp_500_data(price_data_cleaned, 2014)
snp_500_2015 = get_year_wise_snp_500_data(price_data_cleaned, 2015)
snp_500_2016 = get_year_wise_snp_500_data(price_data_cleaned, 2016)
snp_500_2017 = get_year_wise_snp_500_data(price_data_cleaned, 2017)
snp_500_2018 = get_year_wise_snp_500_data(price_data_cleaned, 2018)
snp_500_2019 = get_year_wise_snp_500_data(price_data_cleaned, 2019)
snp_500_2020 = get_year_wise_snp_500_data(price_data_cleaned, 2020)
In [45]:
snp_500_2011
Out[45]:
MMM AOS ABT ABMD ACN ATVI ADM ADBE ADP AAP ... WHR WMB WTW WYNN XEL XLNX YUM ZBRA ZBH ZION
Date
2011-01-03 64.218163 8.125947 17.952976 9.800000 39.224346 11.318138 22.623722 31.290001 31.791464 62.732765 ... 67.926483 11.205445 93.139076 77.632828 16.227924 23.569420 28.570745 38.200001 50.395058 21.907326
2011-01-04 64.129395 8.100383 18.121916 9.800000 38.966022 11.327178 22.608845 31.510000 31.676586 59.610516 ... 66.950417 11.123855 91.576157 80.054619 16.296804 23.665552 28.134232 37.840000 49.725819 21.550467
2011-01-05 64.129395 8.285738 18.121916 10.030000 38.974094 11.110217 22.713020 32.220001 32.183357 59.687115 ... 67.400887 11.141987 92.847679 81.087433 16.200367 23.745670 28.268110 37.799999 49.762482 21.672325
2011-01-06 63.737186 8.289999 18.084377 10.050000 39.119389 11.083097 23.583750 32.270000 32.433369 57.723724 ... 65.966843 11.119320 93.112579 81.678650 16.186602 24.146229 28.465996 37.480000 48.222305 21.611391
2011-01-07 63.803787 8.409311 18.159452 9.890000 39.183979 10.938456 23.777237 32.040001 32.507687 59.265705 ... 65.689018 11.296103 92.953644 84.570557 16.331245 24.010040 28.821018 37.599998 48.213131 21.385098
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2011-12-23 62.401131 8.806977 21.821796 18.469999 43.558022 11.196037 22.084383 28.290001 37.792553 67.537323 ... 39.581589 15.226687 103.337746 82.537613 19.524267 26.499556 35.055527 36.520000 48.781536 14.242986
2011-12-27 62.461864 8.845891 21.903597 18.360001 43.590965 11.168506 22.069183 28.500000 37.869064 68.181442 ... 36.047947 15.277890 103.761589 85.186325 19.825743 26.450407 35.215870 36.590000 48.845718 14.304041
2011-12-28 61.604034 8.612420 21.747778 18.250000 43.533318 11.131798 21.560009 28.020000 37.423878 67.546921 ... 35.854633 14.956691 102.516556 81.952354 19.710886 26.188274 35.025822 35.700001 48.735703 14.033657
2011-12-29 62.332783 8.837245 21.942547 18.379999 44.340408 11.287809 21.841192 28.309999 37.806461 67.633461 ... 36.589203 15.152204 102.966888 82.792717 19.890350 26.409443 35.382153 35.980000 48.992397 14.373814
2011-12-30 62.044331 8.672951 21.903597 18.469999 43.838036 11.306164 21.734802 28.270000 37.569965 66.941238 ... 36.689724 15.370993 102.781456 82.905281 19.840096 26.262003 35.043640 35.779999 48.974060 14.199376

252 rows × 450 columns

In [15]:
snp_500_2011.shift(1)
Out[15]:
MMM AOS ABT ABMD ACN ATVI ADM ADBE ADP AAP ... WHR WMB WTW WYNN XEL XLNX YUM ZBRA ZBH ZION
Date
2011-01-03 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2011-01-04 64.218163 8.125947 17.952976 9.800000 39.224346 11.318138 22.623722 31.290001 31.791464 62.732765 ... 67.926483 11.205445 93.139076 77.632828 16.227924 23.569420 28.570745 38.200001 50.395058 21.907326
2011-01-05 64.129395 8.100383 18.121916 9.800000 38.966022 11.327178 22.608845 31.510000 31.676586 59.610516 ... 66.950417 11.123855 91.576157 80.054619 16.296804 23.665552 28.134232 37.840000 49.725819 21.550467
2011-01-06 64.129395 8.285738 18.121916 10.030000 38.974094 11.110217 22.713020 32.220001 32.183357 59.687115 ... 67.400887 11.141987 92.847679 81.087433 16.200367 23.745670 28.268110 37.799999 49.762482 21.672325
2011-01-07 63.737186 8.289999 18.084377 10.050000 39.119389 11.083097 23.583750 32.270000 32.433369 57.723724 ... 65.966843 11.119320 93.112579 81.678650 16.186602 24.146229 28.465996 37.480000 48.222305 21.611391
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2011-12-23 61.467400 8.679438 21.677664 18.520000 43.302727 10.920726 21.810799 27.889999 37.444744 66.700890 ... 39.094479 15.007899 101.907288 81.066956 19.380699 26.450407 34.675449 36.380001 48.625694 14.085989
2011-12-27 62.401131 8.806977 21.821796 18.469999 43.558022 11.196037 22.084383 28.290001 37.792553 67.537323 ... 39.581589 15.226687 103.337746 82.537613 19.524267 26.499556 35.055527 36.520000 48.781536 14.242986
2011-12-28 62.461864 8.845891 21.903597 18.360001 43.590965 11.168506 22.069183 28.500000 37.869064 68.181442 ... 36.047947 15.277890 103.761589 85.186325 19.825743 26.450407 35.215870 36.590000 48.845718 14.304041
2011-12-29 61.604034 8.612420 21.747778 18.250000 43.533318 11.131798 21.560009 28.020000 37.423878 67.546921 ... 35.854633 14.956691 102.516556 81.952354 19.710886 26.188274 35.025822 35.700001 48.735703 14.033657
2011-12-30 62.332783 8.837245 21.942547 18.379999 44.340408 11.287809 21.841192 28.309999 37.806461 67.633461 ... 36.589203 15.152204 102.966888 82.792717 19.890350 26.409443 35.382153 35.980000 48.992397 14.373814

252 rows × 450 columns

Computing the Daily Log Returns¶

Statistically, stock prices are commonly modeled as log-normally distributed, which makes log returns approximately normal. It is therefore plausible to use properties of the normal distribution in statistical estimation for log returns, but not for simple returns.

Stock-return analysis is a time-series analysis, in which stationarity also matters; it is more readily obtained from log returns than from raw prices.
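As a further illustration of why log returns are convenient, they are additive over time while simple returns are not (using a made-up three-day price path):

```python
import numpy as np

prices = np.array([100.0, 110.0, 99.0])   # made-up three-day price path

simple = prices[1:] / prices[:-1] - 1     # simple returns: +10%, then -10%
log_ret = np.diff(np.log(prices))         # daily log returns

# log returns sum to the total log return over the period...
total_log = np.log(prices[-1] / prices[0])
print(np.isclose(log_ret.sum(), total_log))                  # True
# ...whereas simple returns do not sum to the total simple return (-1%)
print(np.isclose(simple.sum(), prices[-1] / prices[0] - 1))  # False
```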

In [46]:
# Daily log returns: r_t = ln(P_t) - ln(P_{t-1}), computed with the help of the shift function
# (the sign convention does not affect the correlations computed below)
log_returns_2011 = np.log(snp_500_2011) - np.log(snp_500_2011.shift(1))
log_returns_2012 = np.log(snp_500_2012) - np.log(snp_500_2012.shift(1))
log_returns_2013 = np.log(snp_500_2013) - np.log(snp_500_2013.shift(1))
log_returns_2014 = np.log(snp_500_2014) - np.log(snp_500_2014.shift(1))
log_returns_2015 = np.log(snp_500_2015) - np.log(snp_500_2015.shift(1))
log_returns_2016 = np.log(snp_500_2016) - np.log(snp_500_2016.shift(1))
log_returns_2017 = np.log(snp_500_2017) - np.log(snp_500_2017.shift(1))
log_returns_2018 = np.log(snp_500_2018) - np.log(snp_500_2018.shift(1))
log_returns_2019 = np.log(snp_500_2019) - np.log(snp_500_2019.shift(1))
log_returns_2020 = np.log(snp_500_2020) - np.log(snp_500_2020.shift(1))

Computing the Correlation of Returns¶

The heatmaps below show which stocks move together (positive correlations are indicated by light colors).

In [53]:
# Correlation matrices of the yearly log returns (later used as weighted adjacency matrices):
return_correlation_2011 = log_returns_2011.corr()
return_correlation_2012 = log_returns_2012.corr()
return_correlation_2013 = log_returns_2013.corr()
return_correlation_2014 = log_returns_2014.corr()
return_correlation_2015 = log_returns_2015.corr()
return_correlation_2016 = log_returns_2016.corr()
return_correlation_2017 = log_returns_2017.corr()
return_correlation_2018 = log_returns_2018.corr()
return_correlation_2019 = log_returns_2019.corr()
return_correlation_2020 = log_returns_2020.corr()
In [55]:
figure, axes = plt.subplots(5, 2, figsize=(30, 30))
correlations = [return_correlation_2011, return_correlation_2012, return_correlation_2013,
                return_correlation_2014, return_correlation_2015, return_correlation_2016,
                return_correlation_2017, return_correlation_2018, return_correlation_2019,
                return_correlation_2020]
for i, correlation in enumerate(correlations):
    ax = axes[i // 2, i % 2]
    sns.heatmap(correlation, ax=ax)
    ax.set_title("Return Correlation - {}".format(2011 + i))

Inferences¶

The first plot, for 2011, shows high correlation among the stocks: 2011 saw a sharp market correction (the U.S. debt-ceiling crisis and credit downgrade in August), and in the resulting volatility most stock prices fell together, which produces high pairwise correlation.

By contrast, in 2012, 2014 and 2017 the market was relatively stable, so the correlations among stocks are low.

In 2020, the COVID pandemic made the market volatile again; stock prices moved down (and back up) together with one another, which is why the correlations are high.

From this we can infer that in stable market conditions the correlation matrices contain low correlation values, whereas in stressed market conditions they contain high correlation values.

Creating Graphs¶

In [57]:
graph_2011 = nx.Graph(return_correlation_2011)
In [58]:
figure = plt.figure(figsize=(22, 10))
nx.draw_networkx(graph_2011, with_labels=False)

This is a fully connected network, since we created it directly from the (dense) correlation matrix.

Fully connected means every node is linked to every other node in the network; the unit diagonal of the correlation matrix also produces a self-loop at each node.
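The self-loop behaviour can be verified on a tiny hand-written correlation matrix:

```python
import pandas as pd
import networkx as nx

# a tiny hand-written correlation matrix with the usual unit diagonal
corr = pd.DataFrame([[1.0, 0.3, 0.5],
                     [0.3, 1.0, 0.2],
                     [0.5, 0.2, 1.0]],
                    index=list("XYZ"), columns=list("XYZ"))

g = nx.Graph(corr)                 # adjacency-matrix constructor, as in the cell above
print(g.number_of_edges())         # 6 = 3 pairwise edges + 3 self-loops
print(nx.number_of_selfloops(g))   # 3, one per node, from the unit diagonal
```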

Filtering Graphs using MST¶

MST - Minimum Spanning Tree

A minimum spanning tree (MST), or minimum-weight spanning tree, is a subset of the edges of a connected, edge-weighted undirected graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight. That is, it is a spanning tree whose sum of edge weights is as small as possible.

The MST is a popular technique for eliminating redundancy and noise while preserving the significant links in the network.

While removing redundancy and noise in the data using MST, we might lose some information as well.

You can find more on MST here
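A minimal MST example on a toy weighted graph:

```python
import networkx as nx

# toy weighted graph: 4 nodes, 5 edges
G = nx.Graph()
G.add_weighted_edges_from([
    ("A", "B", 1.0), ("B", "C", 2.0), ("A", "C", 2.5),
    ("C", "D", 1.5), ("B", "D", 3.0),
])

mst = nx.minimum_spanning_tree(G)   # Kruskal's algorithm by default
print(sorted(mst.edges()))          # the n - 1 = 3 cheapest acyclic edges
print(mst.size(weight="weight"))    # total weight 1.0 + 2.0 + 1.5 = 4.5
```

Of the five edges, the two most expensive ones (A–C and B–D) would each close a cycle, so the MST discards them.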

In [59]:
distance_2011 = np.sqrt(2 * (1 - return_correlation_2011))
distance_2012 = np.sqrt(2 * (1 - return_correlation_2012))
distance_2013 = np.sqrt(2 * (1 - return_correlation_2013))
distance_2014 = np.sqrt(2 * (1 - return_correlation_2014))
distance_2015 = np.sqrt(2 * (1 - return_correlation_2015))
distance_2016 = np.sqrt(2 * (1 - return_correlation_2016))
distance_2017 = np.sqrt(2 * (1 - return_correlation_2017))
distance_2018 = np.sqrt(2 * (1 - return_correlation_2018))
distance_2019 = np.sqrt(2 * (1 - return_correlation_2019))
distance_2020 = np.sqrt(2 * (1 - return_correlation_2020))

Before the construction of the MST graph, the correlation coefficient is converted into a distance.
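The transform d = sqrt(2 * (1 - rho)) maps correlation onto a distance: perfectly correlated pairs are at distance 0, uncorrelated pairs at sqrt(2), and perfectly anti-correlated pairs at 2. A quick check on sample values:

```python
import numpy as np

rho = np.array([1.0, 0.5, 0.0, -1.0])   # sample correlation values
d = np.sqrt(2 * (1 - rho))              # the distance transform used above
print(d)                                # 0 for rho=1, sqrt(2) for rho=0, 2 for rho=-1
```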

In [60]:
distance_2011_graph = nx.Graph(distance_2011)
distance_2012_graph = nx.Graph(distance_2012)
distance_2013_graph = nx.Graph(distance_2013)
distance_2014_graph = nx.Graph(distance_2014)
distance_2015_graph = nx.Graph(distance_2015)
distance_2016_graph = nx.Graph(distance_2016)
distance_2017_graph = nx.Graph(distance_2017)
distance_2018_graph = nx.Graph(distance_2018)
distance_2019_graph = nx.Graph(distance_2019)
distance_2020_graph = nx.Graph(distance_2020)
In [61]:
graph_2011_filtered = nx.minimum_spanning_tree(distance_2011_graph)
graph_2012_filtered = nx.minimum_spanning_tree(distance_2012_graph)
graph_2013_filtered = nx.minimum_spanning_tree(distance_2013_graph)
graph_2014_filtered = nx.minimum_spanning_tree(distance_2014_graph)
graph_2015_filtered = nx.minimum_spanning_tree(distance_2015_graph)
graph_2016_filtered = nx.minimum_spanning_tree(distance_2016_graph)
graph_2017_filtered = nx.minimum_spanning_tree(distance_2017_graph)
graph_2018_filtered = nx.minimum_spanning_tree(distance_2018_graph)
graph_2019_filtered = nx.minimum_spanning_tree(distance_2019_graph)
graph_2020_filtered = nx.minimum_spanning_tree(distance_2020_graph)

We choose the MST method to filter out the network graph in each window so as to eliminate the redundancies and noise, and still maintain significant links.

In [24]:
figure, axes = plt.subplots(10, 1, figsize=(24, 120))
nx.draw_networkx(graph_2011_filtered, with_labels=False, ax=axes[0])
nx.draw_networkx(graph_2012_filtered, with_labels=False, ax=axes[1])
nx.draw_networkx(graph_2013_filtered, with_labels=False, ax=axes[2])
nx.draw_networkx(graph_2014_filtered, with_labels=False, ax=axes[3])
nx.draw_networkx(graph_2015_filtered, with_labels=False, ax=axes[4])
nx.draw_networkx(graph_2016_filtered, with_labels=False, ax=axes[5])
nx.draw_networkx(graph_2017_filtered, with_labels=False, ax=axes[6])
nx.draw_networkx(graph_2018_filtered, with_labels=False, ax=axes[7])
nx.draw_networkx(graph_2019_filtered, with_labels=False, ax=axes[8])
nx.draw_networkx(graph_2020_filtered, with_labels=False, ax=axes[9])

On plotting the graphs, we see that the network looks different every year; no two year-wise graphs are very similar.

Computing Graph Statistics over Time¶

In [62]:
# Are the stocks more or less aggregated over time?
average_shortest_path_length = []
year = [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]

for graph in [graph_2011_filtered, graph_2012_filtered, graph_2013_filtered, graph_2014_filtered, graph_2015_filtered,
             graph_2016_filtered, graph_2017_filtered, graph_2018_filtered, graph_2019_filtered, graph_2020_filtered]:
    average_shortest_path_length.append(nx.average_shortest_path_length(graph))
In [63]:
figure = plt.figure(figsize=(22, 8))
sns.lineplot(x='year', y='average_shortest_path_length',
             data=pd.DataFrame({'year': year, 'average_shortest_path_length': average_shortest_path_length}));

From the above plot we can see that the average shortest path length was fairly stable until 2015, increased noticeably in 2016 and 2017, dropped again in 2018, and rose once more in 2020.

Portfolio Construction¶

In [65]:
log_returns_2011_till_2020 = np.log(price_data_cleaned) - np.log(price_data_cleaned.shift(1))
return_correlation_2011_till_2020 = log_returns_2011_till_2020.corr()
In [66]:
figure = plt.figure(figsize=(24, 8))
sns.heatmap(return_correlation_2011_till_2020);
In [67]:
distance_2011_till_2020 = np.sqrt(2 * (1 - return_correlation_2011_till_2020))
distance_2011_till_2020_graph = nx.Graph(distance_2011_till_2020)
distance_2011_till_2020_graph_filtered = nx.minimum_spanning_tree(distance_2011_till_2020_graph)
In [68]:
figure = plt.figure(figsize=(24, 8))
nx.draw_kamada_kawai(distance_2011_till_2020_graph_filtered, with_labels=False)
In [71]:
degree_centrality = nx.degree_centrality(distance_2011_till_2020_graph_filtered)
closeness_centrality = nx.closeness_centrality(distance_2011_till_2020_graph_filtered)
betweenness_centrality = nx.betweenness_centrality(distance_2011_till_2020_graph_filtered)
eigenvector_centrality = nx.eigenvector_centrality_numpy(distance_2011_till_2020_graph_filtered)
In [72]:
dc_data = (pd.DataFrame(list(degree_centrality.items()), columns=['stocks', 'degree_centrality'])
           .sort_values('degree_centrality', ascending=False))
px.bar(data_frame=dc_data, x='stocks', y='degree_centrality', template='plotly_dark')

The bar chart ranks the degree centrality scores of all stocks in the network from 2011 to 2020. The steep initial drop-off shows that only a few stocks maintain strong and widespread correlations, while the majority have relatively weak or specialized connections.

At the top of the ranking, HON (Honeywell International) exhibits the highest degree centrality, meaning it shares significant correlations with the largest number of other stocks in the dataset. This suggests that Honeywell acted as a central connector in the market network during the 2011–2020 period.

This could reflect its diversified business model — spanning industrials, aerospace, and technology — allowing it to be sensitive to and co-move with broader market trends. In other words, HON’s performance was highly interconnected with the general market dynamics, making it a reliable indicator of systemic movements within that period.

Conclusion

Based on the degree centrality analysis:

HON (Honeywell International) emerges as a major hub in the stock correlation network between 2011 and 2020.

Its high number of connections suggests that Honeywell’s price movements were strongly synchronized with those of many other firms, highlighting its influential role in market-wide behavior.

The distribution of centrality scores also points to a core–periphery structure in the market: a few highly connected “core” stocks exert broad influence, while most others remain loosely connected in the periphery.

In [73]:
cc_data = (pd.DataFrame(list(closeness_centrality.items()), columns=['stocks', 'closeness_centrality'])
           .sort_values('closeness_centrality', ascending=False))
px.bar(data_frame=cc_data, x='stocks', y='closeness_centrality', template='plotly_dark')

Closeness centrality also involves the shortest paths between all pairs of stocks on the network.

It is defined as the reciprocal of the average shortest-path distance from a stock to all other stocks reachable from it.
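For intuition, on a five-node path graph the middle node has the highest closeness:

```python
import networkx as nx

# path graph 0-1-2-3-4: node 2 sits in the middle
p = nx.path_graph(5)
cc = nx.closeness_centrality(p)
print(cc[2] > cc[0])    # True: the middle node has the smallest average distance
print(round(cc[2], 3))  # 4 / (2 + 1 + 1 + 2) = 0.667
```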

In [74]:
bc_data = (pd.DataFrame(list(betweenness_centrality.items()), columns=['stocks', 'betweenness_centrality'])
           .sort_values('betweenness_centrality', ascending=False))
px.bar(data_frame=bc_data, x='stocks', y='betweenness_centrality', template='plotly_dark')

Betweenness centrality is, for a given stock, the sum over all pairs of other stocks of the fraction of shortest paths between that pair that pass through it. It quantifies a stock's control over information flow in the network.

So, the stock with the highest score is considered a significant stock in terms of its role in coordinating the information among stocks.
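For intuition, in a star graph the hub carries every leaf-to-leaf shortest path:

```python
import networkx as nx

# star graph: hub node 0 plus four leaves
s = nx.star_graph(4)
bc = nx.betweenness_centrality(s)
print(bc[0])   # 1.0: every leaf-to-leaf shortest path passes through the hub
print(bc[1])   # 0.0: a leaf lies on no shortest path between other nodes
```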

Preliminary Observations¶

Between 2011 and 2020, network analysis of stock correlations reveals that Honeywell (HON) held the highest degree centrality, meaning it was directly connected to the largest number of other stocks and strongly mirrored overall market movements. In contrast, Pentair (PNR), Emerson Electric (EMR), and Danaher (DHR) exhibited the highest closeness centrality, positioning them at the structural core of the market where they could efficiently influence or reflect broader trends through short, indirect connections. Finally, Procter & Gamble (PG), Colgate-Palmolive (CL), and General Dynamics (GD) ranked highest in betweenness centrality, acting as key bridges that link different market sectors and facilitate the flow of information across the network.

From a portfolio optimization perspective, this suggests combining central, diversified industrial leaders like HON, EMR, and DHR for exposure to general market dynamics with connector stocks like PG and CL to capture cross-sector influence, while balancing them with lower-centrality stocks to mitigate systemic risk.

Selecting Stocks based on Network Topological Parameters¶

In [81]:
# we already computed degree centrality above

# we already computed betweenness centrality above

# distance on degree criterion
distance_degree_criteria = {}
node_with_largest_degree_centrality = max(degree_centrality, key=degree_centrality.get)
for node in distance_2011_till_2020_graph_filtered.nodes():
    distance_degree_criteria[node] = nx.shortest_path_length(distance_2011_till_2020_graph_filtered, node,
                                                             node_with_largest_degree_centrality)

# distance on correlation criterion
distance_correlation_criteria = {}
sum_correlation = {}

for node in distance_2011_till_2020_graph_filtered.nodes():
    neighbors = nx.neighbors(distance_2011_till_2020_graph_filtered, node)
    sum_correlation[node] = sum(return_correlation_2011_till_2020[node][neighbor] for neighbor in neighbors)

node_with_highest_correlation = max(sum_correlation, key=sum_correlation.get)

for node in distance_2011_till_2020_graph_filtered.nodes():
    distance_correlation_criteria[node] = nx.shortest_path_length(distance_2011_till_2020_graph_filtered, node,
                                                             node_with_highest_correlation)

# distance on distance criterion
distance_distance_criteria = {}
mean_distance = {}

for node in distance_2011_till_2020_graph_filtered.nodes():
    lengths = nx.shortest_path_length(distance_2011_till_2020_graph_filtered, node)
    mean_distance[node] = np.mean([length for target, length in lengths.items() if target != node])

node_with_minimum_mean_distance = min(mean_distance, key=mean_distance.get)

for node in distance_2011_till_2020_graph_filtered.nodes():
    distance_distance_criteria[node] = nx.shortest_path_length(distance_2011_till_2020_graph_filtered, node,
                                                             node_with_minimum_mean_distance)

Distance refers to the smallest length from a node to the central node of the network.

Here, three types of definitions of central node are introduced to reduce the error caused by a single method.

Therefore three types of distances are described here.

1. Distance on degree criterion (Ddegree), the central node is the one that has the largest degree.

2. Distance on correlation criterion (Dcorrelation), the central node is the one with the highest value of the sum of correlation coefficients with its neighbors.

3. Distance on distance criterion (Ddistance), the central node is the one that produces the lowest value for the mean distance.
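A minimal sketch of the degree and distance criteria on a toy tree (the node names are illustrative; the correlation criterion follows the same pattern, with the correlation-sum maximiser as the central node):

```python
import numpy as np
import networkx as nx

# toy tree standing in for the filtered MST; node names are illustrative
t = nx.Graph([("A", "B"), ("B", "C"), ("B", "D"), ("D", "E")])

# degree criterion: the central node is the one with the largest degree
deg = dict(t.degree())
center_degree = max(deg, key=deg.get)

# distance criterion: the central node minimises the mean distance to all others
mean_dist = {
    n: np.mean([nx.shortest_path_length(t, n, m) for m in t if m != n])
    for n in t
}
center_distance = min(mean_dist, key=mean_dist.get)

print(center_degree, center_distance)   # B B (both criteria agree here)

# each stock's score is then its hop distance to the chosen central node
d_degree = {n: nx.shortest_path_length(t, n, center_degree) for n in t}
print(d_degree["E"])                    # 2 (E -> D -> B)
```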

In [82]:
node_stats = pd.DataFrame.from_dict(degree_centrality, orient='index')
node_stats.columns = ['degree_centrality']
node_stats['betweenness_centrality'] = pd.Series(betweenness_centrality)  # align on node labels

node_stats['average_centrality'] = 0.5 * (node_stats['degree_centrality'] + node_stats['betweenness_centrality'])

node_stats['distance_degree_criteria'] = pd.Series(distance_degree_criteria)
node_stats['distance_correlation_criteria'] = pd.Series(distance_correlation_criteria)
node_stats['distance_distance_criteria'] = pd.Series(distance_distance_criteria)
node_stats['average_distance'] = (node_stats['distance_degree_criteria'] + node_stats['distance_correlation_criteria'] +
                                  node_stats['distance_distance_criteria']) / 3
In [37]:
node_stats.head()
Out[37]:
degree_centrality betweenness_centrality average_centrality distance_degree_criteria distance_correlation_criteria distance_distance_criteria average_distance
MMM 0.002227 0.000000 0.001114 2 2 7 3.666667
AOS 0.004454 0.022073 0.013264 6 6 5 5.666667
ABT 0.008909 0.056584 0.032746 9 9 8 8.666667
ABMD 0.002227 0.000000 0.001114 10 10 9 9.666667
ACN 0.006682 0.008899 0.007790 16 16 9 13.666667

We use the parameters defined above to select the portfolios.

The nodes in the largest 10% by degree or betweenness centrality are chosen for the central portfolio.

The nodes whose degree equals 1 or whose betweenness centrality equals 0 are chosen for the peripheral portfolio.

Similarly, nodes ranking in the top 10% by distance are assigned to the peripheral portfolio, and those in the bottom 10% by distance to the central portfolio.

The central portfolios and peripheral portfolios represent two opposite sides of correlation and agglomeration. Generally speaking, central stocks play a vital role in the market and impose a strong influence on other stocks. On the other hand, the correlations between peripheral stocks are weak and contain much more noise than those of the central stocks.
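The 10% cut-off rule above can be sketched with pandas quantiles (a toy `stats` frame with hypothetical tickers A–J stands in for `node_stats`; the notebook itself simply takes the top 15 rows by each score):

```python
import pandas as pd

# Hypothetical mini version of node_stats with the two averaged scores.
stats = pd.DataFrame({
    "average_centrality": [0.30, 0.25, 0.05, 0.04, 0.03,
                           0.02, 0.02, 0.01, 0.01, 0.01],
    "average_distance":   [4.0, 4.5, 9.0, 10.0, 12.0,
                           15.0, 16.0, 18.0, 20.0, 21.0],
}, index=list("ABCDEFGHIJ"))

# Central portfolio: top 10% by centrality; peripheral: top 10% by distance.
central = stats[stats["average_centrality"]
                >= stats["average_centrality"].quantile(0.9)].index
peripheral = stats[stats["average_distance"]
                   >= stats["average_distance"].quantile(0.9)].index
print(list(central), list(peripheral))
```

With ten rows the 90th percentile keeps a single ticker per portfolio; on the full `node_stats` the same rule would keep roughly 50 of the ~500 S&P 500 constituents.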

In [83]:
central_stocks = node_stats.sort_values('average_centrality', ascending=False).head(15)
central_portfolio = central_stocks.index.tolist()
In [84]:
peripheral_stocks = node_stats.sort_values('average_distance', ascending=False).head(15)
peripheral_portfolio = peripheral_stocks.index.tolist()
In [85]:
central_stocks
Out[85]:
degree_centrality betweenness_centrality average_centrality distance_degree_criteria distance_correlation_criteria distance_distance_criteria average_distance
PRU 0.011136 0.639884 0.325510 7 7 0 4.666667
AMP 0.028953 0.540198 0.284576 5 5 2 4.000000
LNC 0.015590 0.526050 0.270820 6 6 1 4.333333
AME 0.020045 0.517430 0.268737 4 4 3 3.666667
GL 0.017817 0.452504 0.235160 8 8 1 5.666667
PH 0.031180 0.389924 0.210552 2 2 5 3.000000
EMR 0.006682 0.414403 0.210542 3 3 4 3.333333
TFC 0.008909 0.400791 0.204850 10 10 3 7.666667
USB 0.006682 0.391624 0.199153 9 9 2 6.666667
PNC 0.008909 0.353911 0.181410 11 11 4 8.666667
JPM 0.011136 0.351078 0.181107 12 12 5 9.666667
PFG 0.011136 0.345023 0.178079 8 8 1 5.666667
BRK-B 0.008909 0.332216 0.170563 13 13 6 10.666667
HST 0.006682 0.331202 0.168942 9 9 2 6.666667
ADP 0.011136 0.312888 0.162012 14 14 7 11.666667
In [86]:
peripheral_stocks
Out[86]:
degree_centrality betweenness_centrality average_centrality distance_degree_criteria distance_correlation_criteria distance_distance_criteria average_distance
CHD 0.002227 0.000000 0.001114 24 24 17 21.666667
CLX 0.004454 0.004454 0.004454 23 23 16 20.666667
CAG 0.002227 0.000000 0.001114 23 23 16 20.666667
CPB 0.004454 0.004454 0.004454 22 22 15 19.666667
K 0.002227 0.000000 0.001114 22 22 15 19.666667
HRL 0.002227 0.000000 0.001114 22 22 15 19.666667
SJM 0.002227 0.000000 0.001114 22 22 15 19.666667
KMB 0.004454 0.008889 0.006672 22 22 15 19.666667
ATO 0.002227 0.000000 0.001114 21 21 14 18.666667
MNST 0.002227 0.000000 0.001114 21 21 14 18.666667
GIS 0.011136 0.022162 0.016649 21 21 14 18.666667
WBA 0.002227 0.000000 0.001114 21 21 14 18.666667
CL 0.004454 0.013303 0.008879 21 21 14 18.666667
EVRG 0.002227 0.000000 0.001114 21 21 14 18.666667
ABC 0.002227 0.000000 0.001114 21 21 14 18.666667

Selecting the top 15 stocks for the Central and Peripheral portfolios¶

In [87]:
# Colour nodes by portfolio membership for plotting
color = []

for node in distance_2011_till_2020_graph_filtered:
    if node in central_portfolio:
        color.append('red')
    elif node in peripheral_portfolio:
        color.append('green')
    else:
        color.append('blue')
In [88]:
figure = plt.figure(figsize=(24, 8))
nx.draw_kamada_kawai(distance_2011_till_2020_graph_filtered, with_labels=False, node_color=color)

Here, the red stocks are the central portfolio stocks, and the green ones are the peripheral portfolio stocks.

Performance Evaluation¶

Here we evaluate the portfolios by comparing the 2021 performance of the Central Portfolio, the Peripheral Portfolio, and the S&P 500 index, to find out which performs best.

In [89]:
# collecting data for all S&P 500 components for the year 2021
# %time price_data_2021 = web.DataReader(tickers, 'yahoo', start='2021-01-01', end='2021-12-31')
In [91]:
#Reading data for 2021 S&P 500 stocks:
#price_data_2021 = pd.read_csv('snp500_price_data_2021.csv', index_col=[0])
# price_data_2021 = price_data_2021['Adj Close']
# price_data_2021.to_csv('snp500_price_data_2021.csv')
In [94]:
# Build path to the file saved in the Data folder
main_dir = os.path.dirname(current_path)
file_path = os.path.join(main_dir, "Data", "snp500_price_data_2021.csv")

price_data_2021 = pd.read_csv(file_path, index_col=[0])

print(price_data_2021.head())
                   MMM        AOS         ABT        ABBV        ABMD  \
Date                                                                    
2020-12-31  169.412521  53.700413  107.444366  101.195663  324.200012   
2021-01-04  166.582382  52.818787  107.071472   99.552361  316.730011   
2021-01-05  166.301315  53.161644  108.396248  100.581787  322.600006   
2021-01-06  168.831009  54.993446  108.170555   99.712914  321.609985   
2021-01-07  164.498520  55.669361  109.220566  100.780106  323.559998   

                   ACN       ATVI        ADM        ADBE         ADP  ...  \
Date                                                                  ...   
2020-12-31  257.353546  92.402596  49.218819  500.119995  172.915405  ...   
2021-01-04  252.673630  89.466812  48.691574  485.339996  165.810379  ...   
2021-01-05  254.112091  90.253006  49.638653  485.690002  165.349136  ...   
2021-01-06  256.890442  87.575966  51.649975  466.309998  164.770126  ...   
2021-01-07  259.314148  89.237915  51.191086  477.739990  165.702438  ...   

                  WYNN        XEL        XLNX         XYL         YUM  \
Date                                                                    
2020-12-31  112.830002  64.846153  141.770004  100.827858  106.770256   
2021-01-04  106.900002  63.863796  142.429993   98.747719  104.075439   
2021-01-05  110.190002  63.241299  144.229996   98.628838  104.085266   
2021-01-06  110.849998  64.641907  141.220001  102.789139  104.655716   
2021-01-07  109.750000  63.377476  149.710007  107.454628  103.859055   

                  ZBRA         ZBH       ZION         ZTS  CEG  
Date                                                            
2020-12-31  384.329987  153.096832  42.901466  164.329178  NaN  
2021-01-04  378.130005  152.172821  42.397789  162.432663  NaN  
2021-01-05  380.570007  154.805740  43.069359  163.564590  NaN  
2021-01-06  394.820007  159.217133  47.908611  165.967484  NaN  
2021-01-07  409.100006  158.273254  49.370266  165.818558  NaN  

[5 rows x 505 columns]
In [95]:
price_data_2021.head()
Out[95]:
MMM AOS ABT ABBV ABMD ACN ATVI ADM ADBE ADP ... WYNN XEL XLNX XYL YUM ZBRA ZBH ZION ZTS CEG
Date
2020-12-31 169.412521 53.700413 107.444366 101.195663 324.200012 257.353546 92.402596 49.218819 500.119995 172.915405 ... 112.830002 64.846153 141.770004 100.827858 106.770256 384.329987 153.096832 42.901466 164.329178 NaN
2021-01-04 166.582382 52.818787 107.071472 99.552361 316.730011 252.673630 89.466812 48.691574 485.339996 165.810379 ... 106.900002 63.863796 142.429993 98.747719 104.075439 378.130005 152.172821 42.397789 162.432663 NaN
2021-01-05 166.301315 53.161644 108.396248 100.581787 322.600006 254.112091 90.253006 49.638653 485.690002 165.349136 ... 110.190002 63.241299 144.229996 98.628838 104.085266 380.570007 154.805740 43.069359 163.564590 NaN
2021-01-06 168.831009 54.993446 108.170555 99.712914 321.609985 256.890442 87.575966 51.649975 466.309998 164.770126 ... 110.849998 64.641907 141.220001 102.789139 104.655716 394.820007 159.217133 47.908611 165.967484 NaN
2021-01-07 164.498520 55.669361 109.220566 100.780106 323.559998 259.314148 89.237915 51.191086 477.739990 165.702438 ... 109.750000 63.377476 149.710007 107.454628 103.859055 409.100006 158.273254 49.370266 165.818558 NaN

5 rows × 505 columns

In [99]:
snp_500_2021 = web.DataReader(['sp500'], 'fred', start='2021-01-01', end='2021-12-31')

In [100]:
# Removing NA values:
price_data_2021 = price_data_2021.dropna(axis=1)
snp_500_2021 = snp_500_2021.dropna()
In [101]:
price_data_2021.head()
Out[101]:
MMM AOS ABT ABBV ABMD ACN ATVI ADM ADBE ADP ... WTW WYNN XEL XLNX XYL YUM ZBRA ZBH ZION ZTS
Date
2020-12-31 169.412521 53.700413 107.444366 101.195663 324.200012 257.353546 92.402596 49.218819 500.119995 172.915405 ... 210.679993 112.830002 64.846153 141.770004 100.827858 106.770256 384.329987 153.096832 42.901466 164.329178
2021-01-04 166.582382 52.818787 107.071472 99.552361 316.730011 252.673630 89.466812 48.691574 485.339996 165.810379 ... 203.699997 106.900002 63.863796 142.429993 98.747719 104.075439 378.130005 152.172821 42.397789 162.432663
2021-01-05 166.301315 53.161644 108.396248 100.581787 322.600006 254.112091 90.253006 49.638653 485.690002 165.349136 ... 202.000000 110.190002 63.241299 144.229996 98.628838 104.085266 380.570007 154.805740 43.069359 163.564590
2021-01-06 168.831009 54.993446 108.170555 99.712914 321.609985 256.890442 87.575966 51.649975 466.309998 164.770126 ... 203.699997 110.849998 64.641907 141.220001 102.789139 104.655716 394.820007 159.217133 47.908611 165.967484
2021-01-07 164.498520 55.669361 109.220566 100.780106 323.559998 259.314148 89.237915 51.191086 477.739990 165.702438 ... 205.250000 109.750000 63.377476 149.710007 107.454628 103.859055 409.100006 158.273254 49.370266 165.818558

5 rows × 503 columns

In [102]:
price_data_2021 = price_data_2021['2021-01-04':]
In [103]:
amount = 100000

# Price-weighted portfolios: sum the component prices, then scale so
# each portfolio is worth `amount` on the first trading day of 2021
central_portfolio_value = price_data_2021[central_portfolio].sum(axis=1)
share = amount / central_portfolio_value.iloc[0]
central_portfolio_value = central_portfolio_value * share

peripheral_portfolio_value = price_data_2021[peripheral_portfolio].sum(axis=1)
share = amount / peripheral_portfolio_value.iloc[0]
peripheral_portfolio_value = peripheral_portfolio_value * share
In [104]:
snp_500_2021_value = snp_500_2021 * (amount / snp_500_2021.iloc[0])
In [105]:
all_portfolios = snp_500_2021_value.copy()  # copy so the original frame is not modified
all_portfolios['central_portfolio'] = central_portfolio_value.values
all_portfolios['peripheral_portfolio'] = peripheral_portfolio_value.values
In [106]:
# all_portfolios = pd.concat([snp_500_2021_value, central_portfolio_value, peripheral_portfolio_value], axis=1)
# all_portfolios.columns = ['snp500', 'central_portfolio', 'peripheral_portfolio']
In [107]:
all_portfolios.head()
Out[107]:
sp500 central_portfolio peripheral_portfolio
DATE
2021-01-04 100000.000000 100000.000000 100000.000000
2021-01-05 100708.253955 100426.138652 100062.699137
2021-01-06 101283.288071 104249.598589 100581.162028
2021-01-07 102787.077946 105184.349194 100174.816809
2021-01-08 103351.573372 105127.033059 100190.990005
In [108]:
figure, ax = plt.subplots(figsize=(16, 8))
snp_500_line = ax.plot(all_portfolios['sp500'], label='S&P 500')
central_portfolio_line = ax.plot(all_portfolios['central_portfolio'], label= 'Central Portfolio')
peripheral_portfolio_line = ax.plot(all_portfolios['peripheral_portfolio'], label= 'Peripheral Portfolio')
ax.legend(loc='upper left')
plt.show()

As seen in the plot above, the Central Portfolio outperforms the S&P 500 index in 2021, while the Peripheral Portfolio underperforms it.

The two portfolios behave differently under different market conditions.

Generally, central portfolio stocks perform better in stable market conditions, whereas peripheral portfolio stocks tend to hold up better during a crisis: their correlations with the rest of the network are weak, so shocks propagating through the market affect them less.

Network analysis can therefore also be used to rebalance a stock portfolio as conditions change.

CONCLUSIONS¶

The results show that the central portfolio, composed of highly connected and influential stocks such as Prudential Financial (PRU), Ameriprise Financial (AMP), and Emerson Electric (EMR), outperformed both the peripheral portfolio and the S&P 500 over 2021. This is consistent with stocks occupying structurally central positions in the market network capturing systemic growth and benefiting from broad economic momentum, effectively acting as “market amplifiers.” In contrast, the peripheral portfolio, while offering some diversification, lagged behind because of its weaker ties to overall market trends.

A good strategy could be to overweight central stocks within the equity allocation to capitalize on their ability to outperform during expansionary phases, while still holding a selection of peripheral, low-centrality stocks for protection and stability during downturns or sector-specific shocks. This balanced approach captures the growth potential of central market players while maintaining diversification and long-term portfolio resilience.
